Singapore¶

Background: The island city-state on the southern tip of the Malay Peninsula has seen many rulers in its history. It has been an outpost of the ancient Sumatran Srivijaya empire; it was part of the Sultanate of Johore in the 15th and 16th centuries, and it was burned down by Portuguese forces in 1617. At the beginning of the 19th century, Singapore became a trade post and settlement of the British East India Company and a British crown colony some years later.

Singapore is today (beside Monaco) the most densely populated independent country in the world. Its strategic location at the Strait of Malacca and the South China Sea, combined with a mostly corruption-free government, a skilled workforce, pro-foreign investment and export-oriented, led to a thriving free-market economy that attracts international investment funds on a large scale despite its relatively high-cost operating environment.

Begin¶

In [1]:
#  !python library_conso.py
#  import library_conso  # quick import 
import numpy as np
import pandas as pd
pd.set_option('display.float_format', '{:.2f}'.format) # ? or change
import seaborn as sns
import matplotlib.pyplot as plt
import requests
import pydrive
import pandas_profiling
from pandas_profiling import ProfileReport
import glob  # trying to automate it 
import numpy as np
from bokeh.layouts import column
from bokeh.models import Div, TeX
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from sklearn.preprocessing import LabelEncoder
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
%matplotlib inline
output_notebook()
Loading BokehJS ...
In [2]:
# !dir ..\data\all_hpi
#  03/11/2022  11:05 AM            12,287 metadata-resale-flat-prices.txt
#  03/11/2022  11:05 AM        22,922,820 resale-flat-prices-based-on-approval-date-1990-1999.csv
#  03/11/2022  11:05 AM        29,739,597 resale-flat-prices-based-on-approval-date-2000-feb-2012.csv
#  03/11/2022  11:05 AM         3,108,078 resale-flat-prices-based-on-registration-date-from-jan-2015-to-dec-2016.csv
#  03/11/2022  11:05 AM        11,978,170 resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv
#  03/11/2022  11:05 AM         4,212,975 resale-flat-prices-based-on-registration-date-from-mar-2012-to-dec-2014.csv    
# 
# could use glob here, but i aldo do care about the timestamps...
# i also have this stored on S3 for direct access instead of local 
df1 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-approval-date-1990-1999.csv')
df2 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-approval-date-2000-feb-2012.csv')
df3 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-registration-date-from-mar-2012-to-dec-2014.csv')
df4 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-registration-date-from-jan-2015-to-dec-2016.csv')
df5 = pd.read_csv(r'D:\GITHUB_Repos\Singapore\ENTER\data\all_hpi\resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv')
In [3]:
df1.head(); df2.head(); df3.head(); df4.head(); df5.head()   # yes, some data csv dont have remaining_lease col... 
Out[3]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date resale_price
0 1990-01 ANG MO KIO 1 ROOM 309 ANG MO KIO AVE 1 10 TO 12 31.00 IMPROVED 1977 9000
1 1990-01 ANG MO KIO 1 ROOM 309 ANG MO KIO AVE 1 04 TO 06 31.00 IMPROVED 1977 6000
2 1990-01 ANG MO KIO 1 ROOM 309 ANG MO KIO AVE 1 10 TO 12 31.00 IMPROVED 1977 8000
3 1990-01 ANG MO KIO 1 ROOM 309 ANG MO KIO AVE 1 07 TO 09 31.00 IMPROVED 1977 6000
4 1990-01 ANG MO KIO 3 ROOM 216 ANG MO KIO AVE 1 04 TO 06 73.00 NEW GENERATION 1976 47200
Out[3]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date resale_price
0 2000-01 ANG MO KIO 3 ROOM 170 ANG MO KIO AVE 4 07 TO 09 69.00 Improved 1986 147000.00
1 2000-01 ANG MO KIO 3 ROOM 174 ANG MO KIO AVE 4 04 TO 06 61.00 Improved 1986 144000.00
2 2000-01 ANG MO KIO 3 ROOM 216 ANG MO KIO AVE 1 07 TO 09 73.00 New Generation 1976 159000.00
3 2000-01 ANG MO KIO 3 ROOM 215 ANG MO KIO AVE 1 07 TO 09 73.00 New Generation 1976 167000.00
4 2000-01 ANG MO KIO 3 ROOM 218 ANG MO KIO AVE 1 07 TO 09 67.00 New Generation 1976 163000.00
Out[3]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date resale_price
0 2012-03 ANG MO KIO 2 ROOM 172 ANG MO KIO AVE 4 06 TO 10 45.00 Improved 1986 250000.00
1 2012-03 ANG MO KIO 2 ROOM 510 ANG MO KIO AVE 8 01 TO 05 44.00 Improved 1980 265000.00
2 2012-03 ANG MO KIO 3 ROOM 610 ANG MO KIO AVE 4 06 TO 10 68.00 New Generation 1980 315000.00
3 2012-03 ANG MO KIO 3 ROOM 474 ANG MO KIO AVE 10 01 TO 05 67.00 New Generation 1984 320000.00
4 2012-03 ANG MO KIO 3 ROOM 604 ANG MO KIO AVE 5 06 TO 10 67.00 New Generation 1980 321000.00
Out[3]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease resale_price
0 2015-01 ANG MO KIO 3 ROOM 174 ANG MO KIO AVE 4 07 TO 09 60.00 Improved 1986 70 255000.00
1 2015-01 ANG MO KIO 3 ROOM 541 ANG MO KIO AVE 10 01 TO 03 68.00 New Generation 1981 65 275000.00
2 2015-01 ANG MO KIO 3 ROOM 163 ANG MO KIO AVE 4 01 TO 03 69.00 New Generation 1980 64 285000.00
3 2015-01 ANG MO KIO 3 ROOM 446 ANG MO KIO AVE 10 01 TO 03 68.00 New Generation 1979 63 290000.00
4 2015-01 ANG MO KIO 3 ROOM 557 ANG MO KIO AVE 10 07 TO 09 68.00 New Generation 1980 64 290000.00
Out[3]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date remaining_lease resale_price
0 2017-01 ANG MO KIO 2 ROOM 406 ANG MO KIO AVE 10 10 TO 12 44.00 Improved 1979 61 years 04 months 232000.00
1 2017-01 ANG MO KIO 3 ROOM 108 ANG MO KIO AVE 4 01 TO 03 67.00 New Generation 1978 60 years 07 months 250000.00
2 2017-01 ANG MO KIO 3 ROOM 602 ANG MO KIO AVE 5 01 TO 03 67.00 New Generation 1980 62 years 05 months 262000.00
3 2017-01 ANG MO KIO 3 ROOM 465 ANG MO KIO AVE 10 04 TO 06 68.00 New Generation 1980 62 years 01 month 265000.00
4 2017-01 ANG MO KIO 3 ROOM 601 ANG MO KIO AVE 5 01 TO 03 67.00 New Generation 1980 62 years 05 months 265000.00
In [4]:
df = df1.append(df2).append(df3).append(df4).append(df5).reset_index(drop=True)  # .sort_values('whatever')
In [5]:
df.tail()
Out[5]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date resale_price remaining_lease
867886 2022-03 YISHUN 5 ROOM 501B YISHUN ST 51 07 TO 09 112.00 Improved 2018 652000.00 94 years 11 months
867887 2022-03 YISHUN 5 ROOM 820 YISHUN ST 81 07 TO 09 122.00 Improved 1988 585000.00 65 years 06 months
867888 2022-03 YISHUN EXECUTIVE 359 YISHUN RING RD 01 TO 03 145.00 Apartment 1988 682000.00 65 years 05 months
867889 2022-03 YISHUN EXECUTIVE 387 YISHUN RING RD 04 TO 06 146.00 Maisonette 1988 720000.00 65 years 04 months
867890 2022-03 YISHUN EXECUTIVE 277 YISHUN ST 22 10 TO 12 146.00 Maisonette 1985 788000.00 62 years 06 months
In [6]:
assert  len(df) == len(df1) + len(df2) + len(df3) + len(df4) + len(df5), "Warning:  During merge, you lose some data rows..."
assert  len(df) == 867891
In [7]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 867891 entries, 0 to 867890
Data columns (total 11 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   month                867891 non-null  object 
 1   town                 867891 non-null  object 
 2   flat_type            867891 non-null  object 
 3   block                867891 non-null  object 
 4   street_name          867891 non-null  object 
 5   storey_range         867891 non-null  object 
 6   floor_area_sqm       867891 non-null  float64
 7   flat_model           867891 non-null  object 
 8   lease_commence_date  867891 non-null  int64  
 9   resale_price         867891 non-null  float64
 10  remaining_lease      158841 non-null  object 
dtypes: float64(2), int64(1), object(8)
memory usage: 72.8+ MB
In [8]:
df.dtypes
Out[8]:
month                   object
town                    object
flat_type               object
block                   object
street_name             object
storey_range            object
floor_area_sqm         float64
flat_model              object
lease_commence_date      int64
resale_price           float64
remaining_lease         object
dtype: object
In [9]:
df.resale_price.describe()
Out[9]:
count    867891.00
mean     303752.12
std      156280.58
min        5000.00
25%      186000.00
50%      283000.00
75%      395000.00
max     1360000.00
Name: resale_price, dtype: float64

Warning¶

  • These prices go way way back, so you need to 'present day' convert them...
  • Missing data col (remaining_lease) for three of the chunks of data files
  • Has there always been 27 towns ? Wasn't it less at some point ? (careful about newer town/cities)
  • flat_type: 'MULTI GENERATION' and 'MULTI-GENERATION', same thing ?

Unique Values per Feature¶

In [10]:
for c in df.columns: print(c)  # there are all the cols I have to work with initially
month
town
flat_type
block
street_name
storey_range
floor_area_sqm
flat_model
lease_commence_date
resale_price
remaining_lease
In [11]:
df.town.unique().tolist()  # len(df.town.unique()) - >  27 unique towns 
Out[11]:
['ANG MO KIO',
 'BEDOK',
 'BISHAN',
 'BUKIT BATOK',
 'BUKIT MERAH',
 'BUKIT TIMAH',
 'CENTRAL AREA',
 'CHOA CHU KANG',
 'CLEMENTI',
 'GEYLANG',
 'HOUGANG',
 'JURONG EAST',
 'JURONG WEST',
 'KALLANG/WHAMPOA',
 'MARINE PARADE',
 'QUEENSTOWN',
 'SENGKANG',
 'SERANGOON',
 'TAMPINES',
 'TOA PAYOH',
 'WOODLANDS',
 'YISHUN',
 'LIM CHU KANG',
 'SEMBAWANG',
 'BUKIT PANJANG',
 'PASIR RIS',
 'PUNGGOL']
In [12]:
df['town'] = df['town'].str.title() # capitalize the first letter of each word in town 
In [13]:
df.town.unique().tolist()  # len(df.town.unique()) - >  27 unique towns 
Out[13]:
['Ang Mo Kio',
 'Bedok',
 'Bishan',
 'Bukit Batok',
 'Bukit Merah',
 'Bukit Timah',
 'Central Area',
 'Choa Chu Kang',
 'Clementi',
 'Geylang',
 'Hougang',
 'Jurong East',
 'Jurong West',
 'Kallang/Whampoa',
 'Marine Parade',
 'Queenstown',
 'Sengkang',
 'Serangoon',
 'Tampines',
 'Toa Payoh',
 'Woodlands',
 'Yishun',
 'Lim Chu Kang',
 'Sembawang',
 'Bukit Panjang',
 'Pasir Ris',
 'Punggol']

https://en.wikipedia.org/wiki/Ang_Mo_Kio

In [14]:
# read list of towns from wikipedia for sanity 

# interesting:  https://www.hdb.gov.sg/cs/infoweb/about-us/history
    
# As of 2021, there in 23 towns and 3 estates across Singapore. These statistics reflect the boundaries
# of HDB towns and are not necessarily the same as planning area statistics.

towns_singapore_scraped_wikipedia = pd.read_html('https://en.wikipedia.org/wiki/New_towns_of_Singapore')
towns_singapore_scraped_wikipedia[0]
Out[14]:
Name (English/Malay) Chinese Pinyin Tamil Total area (km2) Residential area (km2) Dwelling units Projected ultimate Population
0 Ang Mo Kio 宏茂桥 hóngmàoqiáo ஆங் மோ கியோ 6.38 2.83 50726 58000 138200
1 Bedok 勿洛 wùluò பிடோ 9.37 4.18 62816 79000 187900
2 Bishan 碧山 bìshān பீஷான் 6.90 1.72 20072 34000 61100
3 Bukit Batok 武吉巴督 wǔjíbādū புக்கிட் பாத்தோக் 7.85 2.91 44285 54000 121400
4 Bukit Merah 红山 hóngshān புக்கிட் மேரா 8.58 3.12 54227 68000 141400
5 Bukit Panjang 武吉班让 wǔjíbānràng புக்கிட் பாஞ்சாங் 4.89 2.19 35325 44000 118900
6 Choa Chu Kang 蔡厝港 càicuògǎng சுவா சூ காங் 5.83 3.07 48900 62000 167200
7 Clementi 金文泰 jīnwéntài கிளிமெண்டி 4.12 2.03 26730 39000 69500
8 Geylang 芽笼 yálóng கேலாங் 6.78 2.14 30892 50000 86000
9 Hougang 后港 hòugǎng ஹவ்காங் 13.09 3.67 57272 72000 179700
10 Jurong East 裕廊东 yùlángdōng ஜூரோங் 3.84 1.65 24122 31000 75400
11 Jurong West 裕廊西 yùlángxī ஜூரோங் 9.87 4.80 75208 94000 253800
12 Kallang/Whampoa 加冷/黄浦 jiālĕng/huángpǔ காலாங் 7.99 2.10 39931 57000 105200
13 Pasir Ris 巴西立 bāxīlì பாசிர் ரிஸ் 6.01 3.18 29654 44000 106600
14 Punggol 榜鵝 bǎng'é பொங்கோல் 9.57 3.74 50663 96000 146900
15 Queenstown 女皇镇 nǚhuángzhèn குவீன்ஸ்டவுன் 6.94 2.10 33164 60000 81200
16 Sembawang 三巴旺 sānbāwàng செம்பவாங் 7.08 3.31 30020 65000 81500
17 Sengkang 盛港 shènggǎng செங்காங 10.55 3.97 69196 96000 217700
18 Serangoon 实龙岗 shílónggāng சிராங்கூன் 7.37 1.63 21632 30000 66800
19 Tampines 淡滨尼 dànbīnní தெம்பினிஸ் 12.00 5.49 72683 110000 232700
20 Toa Payoh 大巴窑 dàbāyáo தோ பாயோ 5.56 2.48 39737 61000 103800
21 Woodlands 兀兰 wùlán ஊட்லண்ட்ஸ் 11.98 4.80 69900 102000 243300
22 Yishun 义顺 yìshùn யீஷூன் 7.78 3.98 65158 84000 198500
In [15]:
towns_singapore_scraped_wikipedia[1]  #  estates ? 
Out[15]:
Name (English/Malay) Chinese Pinyin Tamil Dwelling units Population
0 Bukit Timah 武吉知马 wūjízhīmǎ புக்கித் திமா 2423 8100
1 Marine Parade 马林百列 mǎlínbǎiliè மரின் பரேட் 6537 20800
2 Central Area 新加坡中區 xīnjiāpōzhōngqū சிங்கப்பூர் மாவட்டம் 9459 27200

image.png

In [16]:
df.flat_type.unique().tolist() 
# Multi and Multi the same ?
Out[16]:
['1 ROOM',
 '3 ROOM',
 '4 ROOM',
 '5 ROOM',
 '2 ROOM',
 'EXECUTIVE',
 'MULTI GENERATION',
 'MULTI-GENERATION']
In [17]:
df.flat_type.value_counts()
Out[17]:
4 ROOM              326735
3 ROOM              281589
5 ROOM              181442
EXECUTIVE            65888
2 ROOM               10423
1 ROOM                1285
MULTI GENERATION       279
MULTI-GENERATION       250
Name: flat_type, dtype: int64
In [18]:
len(df.block.unique())  # there are many unique blocks, mainly numbers, but do include LETTERS as well
# thus it is object/string
Out[18]:
2578
In [19]:
len(df.street_name.unique().tolist())  # 577 unique street names...
Out[19]:
577
In [20]:
df.storey_range.unique().tolist()
Out[20]:
['10 TO 12',
 '04 TO 06',
 '07 TO 09',
 '01 TO 03',
 '13 TO 15',
 '19 TO 21',
 '16 TO 18',
 '25 TO 27',
 '22 TO 24',
 '28 TO 30',
 '31 TO 33',
 '40 TO 42',
 '37 TO 39',
 '34 TO 36',
 '06 TO 10',
 '01 TO 05',
 '11 TO 15',
 '16 TO 20',
 '21 TO 25',
 '26 TO 30',
 '36 TO 40',
 '31 TO 35',
 '46 TO 48',
 '43 TO 45',
 '49 TO 51']
In [21]:
df.storey_range.value_counts()
Out[21]:
04 TO 06    219125
07 TO 09    197571
01 TO 03    176158
10 TO 12    167760
13 TO 15     56354
16 TO 18     21500
19 TO 21     10247
22 TO 24      6663
25 TO 27      2911
01 TO 05      2700
06 TO 10      2474
28 TO 30      1278
11 TO 15      1259
31 TO 33       427
34 TO 36       392
37 TO 39       380
16 TO 20       265
40 TO 42       192
21 TO 25        92
43 TO 45        44
26 TO 30        39
46 TO 48        38
49 TO 51        13
36 TO 40         7
31 TO 35         2
Name: storey_range, dtype: int64
In [22]:
#  this is tricky.  you can label encode this or whatever, but maybe worth creating
#  a couple new entries in df.  maybe three features:
#  the first value, the last value, and the average value...
#  i.e. 04 - 06 would have new col for 4, 6, and 5... 

df['storey_first'] = df['storey_range'].str.split().str[0]
df['storey_last'] = df['storey_range'].str.split().str[2]
df['storey_first'] = pd.to_numeric(df['storey_first'])
df['storey_last'] = pd.to_numeric(df['storey_last'])
df['storey_avg'] = (df['storey_first'] + df['storey_last'])/2
In [23]:
df.columns
Out[23]:
Index(['month', 'town', 'flat_type', 'block', 'street_name', 'storey_range',
       'floor_area_sqm', 'flat_model', 'lease_commence_date', 'resale_price',
       'remaining_lease', 'storey_first', 'storey_last', 'storey_avg'],
      dtype='object')
In [24]:
df.flat_model.unique().tolist()
# we must assume these have many duplicated...
# i.e. IMPROVED actually equals Improved
Out[24]:
['IMPROVED',
 'NEW GENERATION',
 'MODEL A',
 'STANDARD',
 'SIMPLIFIED',
 'MODEL A-MAISONETTE',
 'APARTMENT',
 'MAISONETTE',
 'TERRACE',
 '2-ROOM',
 'IMPROVED-MAISONETTE',
 'MULTI GENERATION',
 'PREMIUM APARTMENT',
 'Improved',
 'New Generation',
 'Model A',
 'Standard',
 'Apartment',
 'Simplified',
 'Model A-Maisonette',
 'Maisonette',
 'Multi Generation',
 'Adjoined flat',
 'Premium Apartment',
 'Terrace',
 'Improved-Maisonette',
 'Premium Maisonette',
 '2-room',
 'Model A2',
 'DBSS',
 'Type S1',
 'Type S2',
 'Premium Apartment Loft']
In [25]:
df['flat_model'] = df['flat_model'].str.lower()
In [26]:
df.flat_model.unique().tolist()
Out[26]:
['improved',
 'new generation',
 'model a',
 'standard',
 'simplified',
 'model a-maisonette',
 'apartment',
 'maisonette',
 'terrace',
 '2-room',
 'improved-maisonette',
 'multi generation',
 'premium apartment',
 'adjoined flat',
 'premium maisonette',
 'model a2',
 'dbss',
 'type s1',
 'type s2',
 'premium apartment loft']
In [27]:
df.flat_model.value_counts()
Out[27]:
model a                   242173
improved                  227656
new generation            182208
simplified                 55353
standard                   41047
premium apartment          40355
apartment                  33607
maisonette                 28192
model a2                    9490
dbss                        2596
model a-maisonette          1974
adjoined flat               1155
terrace                      670
multi generation             529
type s1                      360
type s2                      186
improved-maisonette          118
premium maisonette            85
premium apartment loft        84
2-room                        53
Name: flat_model, dtype: int64

https://www.teoalida.com/singapore/HDB-flat-size-comparison-1980s-vs-present.png

image.png

In [28]:
# wtf 

https://www.teoalida.com/singapore/hdbflattypes/

https://www.teoalida.com/singapore/hdbmap/

image.png


In [29]:
df.columns
Out[29]:
Index(['month', 'town', 'flat_type', 'block', 'street_name', 'storey_range',
       'floor_area_sqm', 'flat_model', 'lease_commence_date', 'resale_price',
       'remaining_lease', 'storey_first', 'storey_last', 'storey_avg'],
      dtype='object')
In [30]:
df.floor_area_sqm.describe()
Out[30]:
count   867891.00
mean        95.68
std         25.96
min         28.00
25%         73.00
50%         93.00
75%        113.00
max        307.00
Name: floor_area_sqm, dtype: float64
In [31]:
# create new variable of sqft for us unsophisticated Americans
df['floor_area_sqft']  = df['floor_area_sqm'] * 10.7639
In [32]:
df.floor_area_sqft.describe()
Out[32]:
count   867891.00
mean      1029.93
std        279.42
min        301.39
25%        785.76
50%       1001.04
75%       1216.32
max       3304.52
Name: floor_area_sqft, dtype: float64
In [33]:
df.columns
Out[33]:
Index(['month', 'town', 'flat_type', 'block', 'street_name', 'storey_range',
       'floor_area_sqm', 'flat_model', 'lease_commence_date', 'resale_price',
       'remaining_lease', 'storey_first', 'storey_last', 'storey_avg',
       'floor_area_sqft'],
      dtype='object')
In [34]:
df.resale_price.describe()
Out[34]:
count    867891.00
mean     303752.12
std      156280.58
min        5000.00
25%      186000.00
50%      283000.00
75%      395000.00
max     1360000.00
Name: resale_price, dtype: float64
In [35]:
df.flat_type.unique()  # maybe convert the E and the Multi-G to like 6 ROOM and 7 ROOM ?  
# figure out way to quantify and rank, you can just label encode this. 
# make sure they are in numeric significant 'order' ranked, etc crap
Out[35]:
array(['1 ROOM', '3 ROOM', '4 ROOM', '5 ROOM', '2 ROOM', 'EXECUTIVE',
       'MULTI GENERATION', 'MULTI-GENERATION'], dtype=object)
In [36]:
df.lease_commence_date.unique()
Out[36]:
array([1977, 1976, 1978, 1979, 1984, 1980, 1985, 1981, 1982, 1986, 1972,
       1983, 1973, 1969, 1975, 1971, 1974, 1967, 1970, 1968, 1988, 1987,
       1989, 1990, 1992, 1993, 1994, 1991, 1995, 1996, 1997, 1998, 1999,
       2000, 2001, 1966, 2002, 2006, 2003, 2005, 2004, 2008, 2007, 2009,
       2010, 2012, 2011, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
      dtype=int64)
In [37]:
df.lease_commence_date.value_counts()
Out[37]:
1985    84260
1984    61034
1988    48699
1987    41179
1978    39833
1986    37131
1989    30943
1980    30918
1979    29820
1997    29368
1993    26992
1996    26782
1998    24187
1983    22235
1976    20674
1981    20110
1977    19976
1999    19399
1992    19338
1995    18137
2001    17845
2000    17007
1975    16704
2003    14287
1974    14022
1982    12800
2002    11468
1970    10981
1990     8915
1973     8302
1969     8091
2004     7706
1971     7626
2015     7623
1994     6229
1967     5939
1972     5586
2016     4445
2013     4322
2012     4013
2005     3617
2014     2829
2006     2815
2017     2650
2011     2238
2008     1862
1968     1822
2009     1559
1991     1192
2010     1159
2007      836
2018      322
2019       34
1966       30
Name: lease_commence_date, dtype: int64
In [38]:
# df.remaining_lease.unique()
#  snippet 
# array([nan, 70.0, 65.0, 64.0, 63.0, 62.0, 69.0, 60.0, 61.0, 86.0, 77.0,
#        80.0, 90.0, 87.0, 66.0, 58.0, 94.0, 71.0, 68.0, 84.0, 73.0, 79.0,
#        76.0, 72.0, 82.0, 74.0, 67.0, 88.0, 81.0, 89.0, 53.0, 54.0, 55.0,
#        57.0, 93.0, 83.0, 85.0, 92.0, 91.0, 59.0, 95.0, 52.0, 51.0, 56.0,
#        75.0, 96.0, 78.0, 50.0, 97.0, 49.0, 48.0, '61 years 04 months',
#        '60 years 07 months', '62 years 05 months', '62 years 01 month',
#        '63 years', '61 years 06 months', '58 years 04 months',
#        '59 years 08 months', '59 years 06 months', '60 years',
#        '62 years 08 months', '61 years', '60 years 10 months',
#        '59 years 03 months', '61 years 05 months', '60 years 04 months',
#        '62 years', '60 years 03 months', '63 years 09 months',
#        '61 years 01 month', '61 years 10 months', '58 years 06 months',
#        '59 years 04 months', '62 years 11 months', '60 years 08 months',
#        '93 years 08 months', '93 years 07 months', '60 years 01 month',
In [39]:
df.remaining_lease.value_counts
Out[39]:
<bound method IndexOpsMixin.value_counts of 0                        NaN
1                        NaN
2                        NaN
3                        NaN
4                        NaN
                 ...        
867886    94 years 11 months
867887    65 years 06 months
867888    65 years 05 months
867889    65 years 04 months
867890    62 years 06 months
Name: remaining_lease, Length: 867891, dtype: object>
In [40]:
# one approach:  
# we really most likely care about the number of years..
# break out the values like: 94 years 11 months and push
# to new col years and new col months (or call it a decimal of years)
In [41]:
# 94 years 11 months
In [42]:
df[df['remaining_lease'].str.contains("years", na=False)].head()
Out[42]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date resale_price remaining_lease storey_first storey_last storey_avg floor_area_sqft
746203 2017-01 Ang Mo Kio 2 ROOM 406 ANG MO KIO AVE 10 10 TO 12 44.00 improved 1979 232000.00 61 years 04 months 10 12 11.00 473.61
746204 2017-01 Ang Mo Kio 3 ROOM 108 ANG MO KIO AVE 4 01 TO 03 67.00 new generation 1978 250000.00 60 years 07 months 1 3 2.00 721.18
746205 2017-01 Ang Mo Kio 3 ROOM 602 ANG MO KIO AVE 5 01 TO 03 67.00 new generation 1980 262000.00 62 years 05 months 1 3 2.00 721.18
746206 2017-01 Ang Mo Kio 3 ROOM 465 ANG MO KIO AVE 10 04 TO 06 68.00 new generation 1980 265000.00 62 years 01 month 4 6 5.00 731.95
746207 2017-01 Ang Mo Kio 3 ROOM 601 ANG MO KIO AVE 5 01 TO 03 67.00 new generation 1980 265000.00 62 years 05 months 1 3 2.00 721.18
In [43]:
# df[df['remaining_lease'].str.contains("years", na=False)]['remaining_lease'] 
# not sure what makes sense to do here...

Pandas Profiling¶

In [44]:
profile = ProfileReport(df, title="Pandas Profiling Report")
In [45]:
profile.to_notebook_iframe()
Summarize dataset:   0%|          | 0/5 [00:00<?, ?it/s]
Generate report structure:   0%|          | 0/1 [00:00<?, ?it/s]
Render HTML:   0%|          | 0/1 [00:00<?, ?it/s]
In [ ]:
# 11004
In [47]:
labelencoder = LabelEncoder()

df['flat_type_label'] = labelencoder.fit_transform(df['flat_type'])

df
Out[47]:
month town flat_type block street_name storey_range floor_area_sqm flat_model lease_commence_date resale_price remaining_lease storey_first storey_last storey_avg floor_area_sqft flat_type_label
0 1990-01 Ang Mo Kio 1 ROOM 309 ANG MO KIO AVE 1 10 TO 12 31.00 improved 1977 9000.00 NaN 10 12 11.00 333.68 0
1 1990-01 Ang Mo Kio 1 ROOM 309 ANG MO KIO AVE 1 04 TO 06 31.00 improved 1977 6000.00 NaN 4 6 5.00 333.68 0
2 1990-01 Ang Mo Kio 1 ROOM 309 ANG MO KIO AVE 1 10 TO 12 31.00 improved 1977 8000.00 NaN 10 12 11.00 333.68 0
3 1990-01 Ang Mo Kio 1 ROOM 309 ANG MO KIO AVE 1 07 TO 09 31.00 improved 1977 6000.00 NaN 7 9 8.00 333.68 0
4 1990-01 Ang Mo Kio 3 ROOM 216 ANG MO KIO AVE 1 04 TO 06 73.00 new generation 1976 47200.00 NaN 4 6 5.00 785.76 2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
867886 2022-03 Yishun 5 ROOM 501B YISHUN ST 51 07 TO 09 112.00 improved 2018 652000.00 94 years 11 months 7 9 8.00 1205.56 4
867887 2022-03 Yishun 5 ROOM 820 YISHUN ST 81 07 TO 09 122.00 improved 1988 585000.00 65 years 06 months 7 9 8.00 1313.20 4
867888 2022-03 Yishun EXECUTIVE 359 YISHUN RING RD 01 TO 03 145.00 apartment 1988 682000.00 65 years 05 months 1 3 2.00 1560.77 5
867889 2022-03 Yishun EXECUTIVE 387 YISHUN RING RD 04 TO 06 146.00 maisonette 1988 720000.00 65 years 04 months 4 6 5.00 1571.53 5
867890 2022-03 Yishun EXECUTIVE 277 YISHUN ST 22 10 TO 12 146.00 maisonette 1985 788000.00 62 years 06 months 10 12 11.00 1571.53 5

867891 rows × 16 columns

In [48]:
df.drop('flat_type', axis=1, inplace=True)
In [49]:
df.street_name.unique()  # large large number of them 
Out[49]:
array(['ANG MO KIO AVE 1', 'ANG MO KIO AVE 3', 'ANG MO KIO AVE 4',
       'ANG MO KIO AVE 10', 'ANG MO KIO AVE 5', 'ANG MO KIO AVE 8',
       'ANG MO KIO AVE 6', 'ANG MO KIO AVE 9', 'ANG MO KIO AVE 2',
       'BEDOK RESERVOIR RD', 'BEDOK NTH ST 3', 'BEDOK STH RD',
       'NEW UPP CHANGI RD', 'BEDOK NTH RD', 'BEDOK STH AVE 1',
       'CHAI CHEE RD', 'CHAI CHEE DR', 'BEDOK NTH AVE 4',
       'BEDOK STH AVE 3', 'BEDOK STH AVE 2', 'BEDOK NTH ST 2',
       'BEDOK NTH ST 4', 'BEDOK NTH AVE 2', 'BEDOK NTH AVE 3',
       'BEDOK NTH AVE 1', 'BEDOK NTH ST 1', 'CHAI CHEE ST', 'SIN MING RD',
       'SHUNFU RD', 'BT BATOK ST 11', 'BT BATOK WEST AVE 8',
       'BT BATOK WEST AVE 6', 'BT BATOK ST 21', 'BT BATOK EAST AVE 5',
       'BT BATOK EAST AVE 4', 'HILLVIEW AVE', 'BT BATOK CTRL',
       'BT BATOK ST 31', 'BT BATOK EAST AVE 3', 'TAMAN HO SWEE',
       'TELOK BLANGAH CRES', 'BEO CRES', 'TELOK BLANGAH DR', 'DEPOT RD',
       'TELOK BLANGAH RISE', 'JLN BT MERAH', 'HENDERSON RD', 'INDUS RD',
       'BT MERAH VIEW', 'HENDERSON CRES', 'BT PURMEI RD',
       'TELOK BLANGAH HTS', 'EVERTON PK', 'KG BAHRU HILL', 'REDHILL CL',
       'HOY FATT RD', 'HAVELOCK RD', 'JLN KLINIK', 'JLN RUMAH TINGGI',
       'JLN BT HO SWEE', 'KIM CHENG ST', 'MOH GUAN TER',
       'TELOK BLANGAH WAY', 'KIM TIAN RD', 'KIM TIAN PL', 'EMPRESS RD',
       "QUEEN'S RD", 'FARRER RD', 'JLN KUKOH', 'OUTRAM PK', 'SHORT ST',
       'SELEGIE RD', 'UPP CROSS ST', 'WATERLOO ST', 'QUEEN ST',
       'BUFFALO RD', 'ROWELL RD', 'ROCHOR RD', 'BAIN ST', 'SMITH ST',
       'VEERASAMY RD', 'TECK WHYE AVE', 'TECK WHYE LANE',
       'CLEMENTI AVE 3', 'WEST COAST DR', 'CLEMENTI AVE 2',
       'CLEMENTI AVE 5', 'CLEMENTI AVE 4', 'CLEMENTI AVE 1',
       'WEST COAST RD', 'CLEMENTI WEST ST 1', 'CLEMENTI WEST ST 2',
       'CLEMENTI ST 13', "C'WEALTH AVE WEST", 'CLEMENTI AVE 6',
       'CLEMENTI ST 14', 'CIRCUIT RD', 'MACPHERSON LANE',
       'JLN PASAR BARU', 'GEYLANG SERAI', 'EUNOS CRES', 'SIMS DR',
       'ALJUNIED CRES', 'GEYLANG EAST AVE 1', 'DAKOTA CRES', 'PINE CL',
       'HAIG RD', 'BALAM RD', 'JLN DUA', 'GEYLANG EAST CTRL',
       'EUNOS RD 5', 'HOUGANG AVE 3', 'HOUGANG AVE 5', 'HOUGANG AVE 1',
       'HOUGANG ST 22', 'HOUGANG AVE 10', 'LOR AH SOO', 'HOUGANG ST 11',
       'HOUGANG AVE 7', 'HOUGANG ST 21', 'TEBAN GDNS RD',
       'JURONG EAST AVE 1', 'JURONG EAST ST 32', 'JURONG EAST ST 13',
       'JURONG EAST ST 21', 'JURONG EAST ST 24', 'JURONG EAST ST 31',
       'PANDAN GDNS', 'YUNG KUANG RD', 'HO CHING RD', 'HU CHING RD',
       'BOON LAY DR', 'BOON LAY AVE', 'BOON LAY PL', 'JURONG WEST ST 52',
       'JURONG WEST ST 41', 'JURONG WEST AVE 1', 'JURONG WEST ST 42',
       'JLN BATU', "ST. GEORGE'S RD", 'NTH BRIDGE RD', 'FRENCH RD',
       'BEACH RD', 'WHAMPOA DR', 'UPP BOON KENG RD', 'BENDEMEER RD',
       'WHAMPOA WEST', 'LOR LIMAU', 'KALLANG BAHRU', 'GEYLANG BAHRU',
       'DORSET RD', 'OWEN RD', 'KG ARANG RD', 'JLN BAHAGIA',
       'MOULMEIN RD', 'TOWNER RD', 'JLN RAJAH', 'KENT RD', 'AH HOOD RD',
       "KING GEORGE'S AVE", 'CRAWFORD LANE', 'MARINE CRES', 'MARINE DR',
       'MARINE TER', "C'WEALTH CL", "C'WEALTH DR", 'TANGLIN HALT RD',
       "C'WEALTH CRES", 'DOVER RD', 'MARGARET DR', 'GHIM MOH RD',
       'DOVER CRES', 'STIRLING RD', 'MEI LING ST', 'HOLLAND CL',
       'HOLLAND AVE', 'HOLLAND DR', 'DOVER CL EAST',
       'SELETAR WEST FARMWAY 6', 'LOR LEW LIAN', 'SERANGOON NTH AVE 1',
       'SERANGOON AVE 2', 'SERANGOON AVE 4', 'SERANGOON CTRL',
       'TAMPINES ST 11', 'TAMPINES ST 21', 'TAMPINES ST 91',
       'TAMPINES ST 81', 'TAMPINES AVE 4', 'TAMPINES ST 22',
       'TAMPINES ST 12', 'TAMPINES ST 23', 'TAMPINES ST 24',
       'TAMPINES ST 41', 'TAMPINES ST 82', 'TAMPINES ST 83',
       'TAMPINES AVE 5', 'LOR 2 TOA PAYOH', 'LOR 8 TOA PAYOH',
       'LOR 1 TOA PAYOH', 'LOR 5 TOA PAYOH', 'LOR 3 TOA PAYOH',
       'LOR 7 TOA PAYOH', 'TOA PAYOH EAST', 'LOR 4 TOA PAYOH',
       'TOA PAYOH CTRL', 'TOA PAYOH NTH', 'POTONG PASIR AVE 3',
       'POTONG PASIR AVE 1', 'UPP ALJUNIED LANE', 'JOO SENG RD',
       'MARSILING LANE', 'MARSILING DR', 'MARSILING RISE',
       'MARSILING CRES', 'WOODLANDS CTR RD', 'WOODLANDS ST 13',
       'WOODLANDS ST 11', 'YISHUN RING RD', 'YISHUN AVE 5',
       'YISHUN ST 72', 'YISHUN ST 11', 'YISHUN ST 21', 'YISHUN ST 22',
       'YISHUN AVE 3', 'CHAI CHEE AVE', 'ZION RD', 'LENGKOK BAHRU',
       'SPOTTISWOODE PK RD', 'NEW MKT RD', 'TG PAGAR PLAZA',
       'KELANTAN RD', 'PAYA LEBAR WAY', 'UBI AVE 1', 'SIMS AVE',
       'YUNG PING RD', 'TAO CHING RD', 'GLOUCESTER RD', 'BOON KENG RD',
       'WHAMPOA STH', 'CAMBRIDGE RD', 'TAMPINES ST 42', 'LOR 6 TOA PAYOH',
       'KIM KEAT AVE', 'YISHUN AVE 6', 'YISHUN AVE 9', 'YISHUN ST 71',
       'BT BATOK ST 32', 'SILAT AVE', 'TIONG BAHRU RD', 'SAGO LANE',
       "ST. GEORGE'S LANE", 'LIM CHU KANG RD', "C'WEALTH AVE",
       "QUEEN'S CL", 'SERANGOON AVE 3', 'POTONG PASIR AVE 2',
       'WOODLANDS AVE 1', 'YISHUN AVE 4', 'LOWER DELTA RD', 'NILE RD',
       'JLN MEMBINA BARAT', 'JLN BERSEH', 'CHANDER RD', 'CASSIA CRES',
       'OLD AIRPORT RD', 'ALJUNIED RD', 'BUANGKOK STH FARMWAY 1',
       'BT BATOK ST 33', 'ALEXANDRA RD', 'CHIN SWEE RD', 'SIMS PL',
       'HOUGANG AVE 2', 'HOUGANG AVE 8', 'SEMBAWANG RD', 'SIMEI ST 1',
       'BT BATOK ST 34', 'BT MERAH CTRL', 'LIM LIAK ST', 'JLN TENTERAM',
       'WOODLANDS ST 32', 'SIN MING AVE', 'BT BATOK ST 52', 'DELTA AVE',
       'PIPIT RD', 'HOUGANG AVE 4', 'QUEENSWAY', 'YISHUN ST 61',
       'BISHAN ST 12', "JLN MA'MOR", 'TAMPINES ST 44', 'TAMPINES ST 43',
       'BISHAN ST 13', 'JLN DUSUN', 'YISHUN AVE 2', 'JOO CHIAT RD',
       'EAST COAST RD', 'REDHILL RD', 'KIM PONG RD', 'RACE COURSE RD',
       'KRETA AYER RD', 'HOUGANG ST 61', 'TESSENSOHN RD', 'MARSILING RD',
       'YISHUN ST 81', 'BT BATOK ST 51', 'BT BATOK WEST AVE 4',
       'BT BATOK WEST AVE 2', 'JURONG WEST ST 91', 'JURONG WEST ST 81',
       'GANGSA RD', 'MCNAIR RD', 'SIMEI ST 4', 'YISHUN AVE 7',
       'SERANGOON NTH AVE 2', 'YISHUN AVE 11', 'BANGKIT RD',
       'JURONG WEST ST 73', 'OUTRAM HILL', 'HOUGANG AVE 6',
       'PASIR RIS ST 12', 'PENDING RD', 'PETIR RD', 'LOR 3 GEYLANG',
       'BISHAN ST 11', 'PASIR RIS DR 6', 'BISHAN ST 23',
       'JURONG WEST ST 92', 'PASIR RIS ST 11', 'YISHUN CTRL',
       'BISHAN ST 22', 'SIMEI RD', 'TAMPINES ST 84', 'BT PANJANG RING RD',
       'JURONG WEST ST 93', 'FAJAR RD', 'WOODLANDS ST 81',
       'CHOA CHU KANG CTRL', 'PASIR RIS ST 51', 'HOUGANG ST 52',
       'CASHEW RD', 'TOH YI DR', 'HOUGANG CTRL', 'KG KAYU RD',
       'TAMPINES AVE 8', 'TAMPINES ST 45', 'SIMEI ST 2',
       'WOODLANDS AVE 3', 'LENGKONG TIGA', 'WOODLANDS ST 82',
       'SERANGOON NTH AVE 4', 'SERANGOON CTRL DR', 'BRIGHT HILL DR',
       'SAUJANA RD', 'CHOA CHU KANG AVE 3', 'TAMPINES AVE 9',
       'JURONG WEST ST 51', 'YUNG HO RD', 'SERANGOON AVE 1',
       'PASIR RIS ST 41', 'GEYLANG EAST AVE 2', 'CHOA CHU KANG AVE 2',
       'KIM KEAT LINK', 'PASIR RIS DR 4', 'PASIR RIS ST 21',
       'SENG POH RD', 'HOUGANG ST 51', 'JURONG WEST ST 72',
       'JURONG WEST ST 71', 'PASIR RIS ST 52', 'TAMPINES ST 32',
       'CHOA CHU KANG AVE 4', 'CHOA CHU KANG LOOP', 'JLN TENAGA',
       'TAMPINES CTRL 1', 'TAMPINES ST 33', 'BT BATOK WEST AVE 7',
       'JURONG WEST AVE 5', 'TAMPINES AVE 7', 'WOODLANDS ST 83',
       'CHOA CHU KANG ST 51', 'PASIR RIS DR 3', 'YISHUN CTRL 1',
       'CHOA CHU KANG AVE 1', 'WOODLANDS ST 31', 'BT MERAH LANE 1',
       'PASIR RIS ST 13', 'ELIAS RD', 'BISHAN ST 24', 'WHAMPOA RD',
       'WOODLANDS ST 41', 'PASIR RIS ST 71', 'JURONG WEST ST 74',
       'PASIR RIS DR 1', 'PASIR RIS ST 72', 'PASIR RIS DR 10',
       'CHOA CHU KANG ST 52', 'CLARENCE LANE', 'CHOA CHU KANG NTH 6',
       'PASIR RIS ST 53', 'CHOA CHU KANG NTH 5', 'ANG MO KIO ST 21',
       'JLN DAMAI', 'CHOA CHU KANG ST 62', 'WOODLANDS AVE 5',
       'WOODLANDS DR 50', 'CHOA CHU KANG ST 53', 'TAMPINES ST 72',
       'UPP SERANGOON RD', 'JURONG WEST ST 75', 'STRATHMORE AVE',
       'ANG MO KIO ST 31', 'TAMPINES ST 34', 'YUNG AN RD',
       'WOODLANDS AVE 4', 'CHOA CHU KANG NTH 7', 'ANG MO KIO ST 11',
       'WOODLANDS AVE 9', 'YUNG LOH RD', 'CHOA CHU KANG DR',
       'CHOA CHU KANG ST 54', 'REDHILL LANE', 'KANG CHING RD',
       'TAH CHING RD', 'SIMEI ST 5', 'WOODLANDS DR 40', 'WOODLANDS DR 70',
       'TAMPINES ST 71', 'WOODLANDS DR 42', 'SERANGOON NTH AVE 3',
       'JELAPANG RD', 'BT BATOK ST 22', 'HOUGANG ST 91',
       'WOODLANDS AVE 6', 'WOODLANDS CIRCLE', 'CORPORATION DR',
       'LOMPANG RD', 'WOODLANDS DR 72', 'CHOA CHU KANG ST 64',
       'BT BATOK ST 24', 'JLN TECK WHYE', 'WOODLANDS CRES',
       'WOODLANDS DR 60', 'CHANGI VILLAGE RD', 'BT BATOK ST 25',
       'HOUGANG AVE 9', 'JURONG WEST CTRL 1', 'WOODLANDS RING RD',
       'CHOA CHU KANG AVE 5', 'TOH GUAN RD', 'JURONG WEST ST 61',
       'WOODLANDS DR 14', 'HOUGANG ST 92', 'CHOA CHU KANG CRES',
       'SEMBAWANG CL', 'CANBERRA RD', 'SEMBAWANG CRES', 'SEMBAWANG VISTA',
       'COMPASSVALE WALK', 'RIVERVALE ST', 'WOODLANDS DR 62',
       'SEMBAWANG DR', 'WOODLANDS DR 53', 'WOODLANDS DR 52',
       'RIVERVALE WALK', 'COMPASSVALE LANE', 'RIVERVALE DR', 'SENJA RD',
       'JURONG WEST ST 65', 'RIVERVALE CRES', 'WOODLANDS DR 44',
       'COMPASSVALE DR', 'WOODLANDS DR 16', 'COMPASSVALE RD',
       'WOODLANDS DR 73', 'HOUGANG ST 31', 'JURONG WEST ST 64',
       'WOODLANDS DR 71', 'YISHUN ST 20', 'ADMIRALTY DR',
       'COMPASSVALE ST', 'BEDOK RESERVOIR VIEW', 'YUNG SHENG RD',
       'ADMIRALTY LINK', 'SENGKANG EAST WAY', 'ANG MO KIO ST 32',
       'ANG MO KIO ST 52', 'BOON TIONG RD', 'JURONG WEST ST 62',
       'ANCHORVALE LINK', 'CANBERRA LINK', 'COMPASSVALE CRES',
       'CLEMENTI ST 12', 'MONTREAL DR', 'WELLINGTON CIRCLE',
       'SENGKANG EAST RD', 'JURONG WEST AVE 3', 'ANCHORVALE LANE',
       'SENJA LINK', 'EDGEFIELD PLAINS', 'ANCHORVALE DR', 'SEGAR RD',
       'FARRER PK RD', 'PUNGGOL FIELD', 'EDGEDALE PLAINS',
       'ANCHORVALE RD', 'CANTONMENT CL', 'JLN MEMBINA', 'FERNVALE LANE',
       'JURONG WEST ST 25', 'CLEMENTI ST 11', 'PUNGGOL FIELD WALK',
       'KLANG LANE', 'PUNGGOL CTRL', 'JELEBU RD', 'BUANGKOK CRES',
       'WOODLANDS DR 75', 'BT BATOK WEST AVE 5', 'JELLICOE RD',
       'PUNGGOL DR', 'JURONG WEST ST 24', 'SEMBAWANG WAY', 'FERNVALE RD',
       'BUANGKOK LINK', 'FERNVALE LINK', 'JLN TIGA', 'YUAN CHING RD',
       'COMPASSVALE LINK', 'MARINE PARADE CTRL', 'COMPASSVALE BOW',
       'PUNGGOL RD', 'BEDOK CTRL', 'PUNGGOL EAST', 'SENGKANG CTRL',
       'TAMPINES CTRL 7', 'SENGKANG WEST AVE', 'PUNGGOL PL',
       'CANTONMENT RD', 'GHIM MOH LINK', 'SIMEI LANE', 'YISHUN ST 41',
       'TELOK BLANGAH ST 31', 'JLN KAYU', 'LOR 1A TOA PAYOH',
       'PUNGGOL WALK', 'SENGKANG WEST WAY', 'BUANGKOK GREEN',
       'PUNGGOL WAY', 'YISHUN ST 31', 'TECK WHYE CRES', 'MONTREAL LINK',
       'UPP SERANGOON CRES', 'SUMANG LINK', 'SENGKANG EAST AVE',
       'YISHUN AVE 1', 'ANCHORVALE CRES', 'ANCHORVALE ST',
       'TAMPINES CTRL 8', 'YISHUN ST 51', 'UPP SERANGOON VIEW',
       'TAMPINES AVE 1', 'BEDOK RESERVOIR CRES', 'ANG MO KIO ST 61',
       'DAWSON RD', 'FERNVALE ST', 'HOUGANG ST 32', 'TAMPINES ST 86',
       'SUMANG WALK', 'CHOA CHU KANG AVE 7', 'KEAT HONG CL',
       'JURONG WEST CTRL 3', 'KEAT HONG LINK', 'ALJUNIED AVE 2',
       'CANBERRA CRES', 'SUMANG LANE', 'CANBERRA ST', 'ANG MO KIO ST 44'],
      dtype=object)
In [50]:
df.flat_model.unique()
Out[50]:
array(['improved', 'new generation', 'model a', 'standard', 'simplified',
       'model a-maisonette', 'apartment', 'maisonette', 'terrace',
       '2-room', 'improved-maisonette', 'multi generation',
       'premium apartment', 'adjoined flat', 'premium maisonette',
       'model a2', 'dbss', 'type s1', 'type s2', 'premium apartment loft'],
      dtype=object)
In [51]:
labelencoder = LabelEncoder()
df['flat_model_label'] = labelencoder.fit_transform(df['flat_model'])
df
Out[51]:
month town block street_name storey_range floor_area_sqm flat_model lease_commence_date resale_price remaining_lease storey_first storey_last storey_avg floor_area_sqft flat_type_label flat_model_label
0 1990-01 Ang Mo Kio 309 ANG MO KIO AVE 1 10 TO 12 31.00 improved 1977 9000.00 NaN 10 12 11.00 333.68 0 4
1 1990-01 Ang Mo Kio 309 ANG MO KIO AVE 1 04 TO 06 31.00 improved 1977 6000.00 NaN 4 6 5.00 333.68 0 4
2 1990-01 Ang Mo Kio 309 ANG MO KIO AVE 1 10 TO 12 31.00 improved 1977 8000.00 NaN 10 12 11.00 333.68 0 4
3 1990-01 Ang Mo Kio 309 ANG MO KIO AVE 1 07 TO 09 31.00 improved 1977 6000.00 NaN 7 9 8.00 333.68 0 4
4 1990-01 Ang Mo Kio 216 ANG MO KIO AVE 1 04 TO 06 73.00 new generation 1976 47200.00 NaN 4 6 5.00 785.76 2 11
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
867886 2022-03 Yishun 501B YISHUN ST 51 07 TO 09 112.00 improved 2018 652000.00 94 years 11 months 7 9 8.00 1205.56 4 4
867887 2022-03 Yishun 820 YISHUN ST 81 07 TO 09 122.00 improved 1988 585000.00 65 years 06 months 7 9 8.00 1313.20 4 4
867888 2022-03 Yishun 359 YISHUN RING RD 01 TO 03 145.00 apartment 1988 682000.00 65 years 05 months 1 3 2.00 1560.77 5 2
867889 2022-03 Yishun 387 YISHUN RING RD 04 TO 06 146.00 maisonette 1988 720000.00 65 years 04 months 4 6 5.00 1571.53 5 6
867890 2022-03 Yishun 277 YISHUN ST 22 10 TO 12 146.00 maisonette 1985 788000.00 62 years 06 months 10 12 11.00 1571.53 5 6

867891 rows × 16 columns

In [52]:
df.drop('flat_model', axis=1, inplace=True)
In [53]:
df.dtypes
Out[53]:
month                   object
town                    object
block                   object
street_name             object
storey_range            object
floor_area_sqm         float64
lease_commence_date      int64
resale_price           float64
remaining_lease         object
storey_first             int64
storey_last              int64
storey_avg             float64
floor_area_sqft        float64
flat_type_label          int32
flat_model_label         int32
dtype: object
In [54]:
labelencoder = LabelEncoder()
df['town'] = labelencoder.fit_transform(df['town'])
df.town.unique()
Out[54]:
array([ 0,  1,  2,  3,  4,  6,  7,  8,  9, 10, 11, 12, 13, 14, 16, 19, 21,
       22, 23, 24, 25, 26, 15, 20,  5, 17, 18])
In [55]:
df.drop('town', axis=1, inplace=True)
In [56]:
df.dtypes
Out[56]:
month                   object
block                   object
street_name             object
storey_range            object
floor_area_sqm         float64
lease_commence_date      int64
resale_price           float64
remaining_lease         object
storey_first             int64
storey_last              int64
storey_avg             float64
floor_area_sqft        float64
flat_type_label          int32
flat_model_label         int32
dtype: object
In [57]:
df
Out[57]:
month block street_name storey_range floor_area_sqm lease_commence_date resale_price remaining_lease storey_first storey_last storey_avg floor_area_sqft flat_type_label flat_model_label
0 1990-01 309 ANG MO KIO AVE 1 10 TO 12 31.00 1977 9000.00 NaN 10 12 11.00 333.68 0 4
1 1990-01 309 ANG MO KIO AVE 1 04 TO 06 31.00 1977 6000.00 NaN 4 6 5.00 333.68 0 4
2 1990-01 309 ANG MO KIO AVE 1 10 TO 12 31.00 1977 8000.00 NaN 10 12 11.00 333.68 0 4
3 1990-01 309 ANG MO KIO AVE 1 07 TO 09 31.00 1977 6000.00 NaN 7 9 8.00 333.68 0 4
4 1990-01 216 ANG MO KIO AVE 1 04 TO 06 73.00 1976 47200.00 NaN 4 6 5.00 785.76 2 11
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
867886 2022-03 501B YISHUN ST 51 07 TO 09 112.00 2018 652000.00 94 years 11 months 7 9 8.00 1205.56 4 4
867887 2022-03 820 YISHUN ST 81 07 TO 09 122.00 1988 585000.00 65 years 06 months 7 9 8.00 1313.20 4 4
867888 2022-03 359 YISHUN RING RD 01 TO 03 145.00 1988 682000.00 65 years 05 months 1 3 2.00 1560.77 5 2
867889 2022-03 387 YISHUN RING RD 04 TO 06 146.00 1988 720000.00 65 years 04 months 4 6 5.00 1571.53 5 6
867890 2022-03 277 YISHUN ST 22 10 TO 12 146.00 1985 788000.00 62 years 06 months 10 12 11.00 1571.53 5 6

867891 rows × 14 columns

In [58]:
df.lease_commence_date.unique()   # this is a YEAR, but should you label encode it, OR keep it as an year integer ??? 
Out[58]:
array([1977, 1976, 1978, 1979, 1984, 1980, 1985, 1981, 1982, 1986, 1972,
       1983, 1973, 1969, 1975, 1971, 1974, 1967, 1970, 1968, 1988, 1987,
       1989, 1990, 1992, 1993, 1994, 1991, 1995, 1996, 1997, 1998, 1999,
       2000, 2001, 1966, 2002, 2006, 2003, 2005, 2004, 2008, 2007, 2009,
       2010, 2012, 2011, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
      dtype=int64)
In [59]:
df.dtypes 
Out[59]:
month                   object
block                   object
street_name             object
storey_range            object
floor_area_sqm         float64
lease_commence_date      int64
resale_price           float64
remaining_lease         object
storey_first             int64
storey_last              int64
storey_avg             float64
floor_area_sqft        float64
flat_type_label          int32
flat_model_label         int32
dtype: object
In [ ]:
# fix or remove ? 

#  df['flat_type_labels'] = labelencoder.fit_transform(df['flat_type'])
In [61]:
len(df)
Out[61]:
867891

Encode¶

In [62]:
df_backup = df.copy()

Correlation Plot¶

In [63]:
c = df.corr()
c
Out[63]:
floor_area_sqm lease_commence_date resale_price storey_first storey_last storey_avg floor_area_sqft flat_type_label flat_model_label
floor_area_sqm 1.00 0.43 0.60 0.05 0.05 0.05 1.00 0.95 -0.41
lease_commence_date 0.43 1.00 0.52 0.15 0.15 0.15 0.43 0.45 -0.21
resale_price 0.60 0.52 1.00 0.23 0.23 0.23 0.60 0.63 -0.27
storey_first 0.05 0.15 0.23 1.00 1.00 1.00 0.05 0.08 -0.08
storey_last 0.05 0.15 0.23 1.00 1.00 1.00 0.05 0.08 -0.08
storey_avg 0.05 0.15 0.23 1.00 1.00 1.00 0.05 0.08 -0.08
floor_area_sqft 1.00 0.43 0.60 0.05 0.05 0.05 1.00 0.95 -0.41
flat_type_label 0.95 0.45 0.63 0.08 0.08 0.08 0.95 1.00 -0.37
flat_model_label -0.41 -0.21 -0.27 -0.08 -0.08 -0.08 -0.41 -0.37 1.00
In [64]:
sns.heatmap(c, annot=True)
plt.show();
In [65]:
# cleaner view 
plt.figure(figsize=(16, 6))
mask = np.triu(np.ones_like(df.corr(), dtype=bool))
heatmap = sns.heatmap(df.corr(), mask=mask, vmin=-1, vmax=1, annot=True, cmap='BrBG')
heatmap.set_title('Triangle Correlation Heatmap', fontdict={'fontsize':18}, pad=16);

Standard EDA¶

In [66]:
sns.set(style='whitegrid')
plt.figure(figsize=(15,10))
sns.scatterplot(x="floor_area_sqm", y="resale_price", hue = "flat_type_label", palette='colorblind', data=df); 
In [67]:
plt.figure(figsize=(15,10))
sns.histplot(data=df, x="resale_price", color="green", kde=True, bins=100)
plt.show();
In [68]:
plt.figure(figsize=(15,10))
sns.scatterplot(x="floor_area_sqm", y="resale_price",data=df); 
In [69]:
p = figure(width=670, height=400, toolbar_location=None,
           title="Normal (Gaussian) Distribution")

n = 1000

rng = np.random.default_rng(825914)

x = rng.normal(loc=4.7, scale=14.3, size=n)

# Scale random data so that it has mean of 0 and standard deviation of 1

xbar = x.mean()

sigma = x.std()

scaled = (x - xbar) / sigma


# Histogram

bins = np.linspace(-3, 3, 40)

hist, edges = np.histogram(scaled, density=True, bins=bins)

p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
         fill_color="skyblue", line_color="white",
         legend_label=f"{n} random samples")

# Probability density function

x = np.linspace(-3.0, 3.0, 100)

pdf = np.exp(-0.5*x**2) / np.sqrt(2.0*np.pi)

p.line(x, pdf, line_width=2, line_color="navy",
       legend_label="Probability Density Function")

p.y_range.start = 0

p.xaxis.axis_label = "x"

p.yaxis.axis_label = "PDF(x)"

p.xaxis.ticker = [-3, -2, -1, 0, 1, 2, 3]

p.xaxis.major_label_overrides = {
    -3: TeX(r"\overline{x} - 3\sigma"),
    -2: TeX(r"\overline{x} - 2\sigma"),
    -1: TeX(r"\overline{x} - \sigma"),
     0: TeX(r"\overline{x}"),
     1: TeX(r"\overline{x} + \sigma"),
     2: TeX(r"\overline{x} + 2\sigma"),
     3: TeX(r"\overline{x} + 3\sigma"),}

p.yaxis.ticker = [0, 0.1, 0.2, 0.3, 0.4]

p.yaxis.major_label_overrides = {
    0: TeX(r"0"),
    0.1: TeX(r"0.1/\sigma"),
    0.2: TeX(r"0.2/\sigma"),
    0.3: TeX(r"0.3/\sigma"),
    0.4: TeX(r"0.4/\sigma"),}

div = Div(text=r"""
A histogram of a samples from a Normal (Gaussian) distribution, together with
the ideal probability density function, given by the equation:
<p />
$$
\qquad PDF(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}
\left(\frac{x-\overline{x}}{\sigma}\right)^2 \right]
$$
""")

show(column(p, div))
Out[69]:
GlyphRenderer(
id = '1040', …)
coordinates = None,
data_source = ColumnDataSource(id='1036', ...),
glyph = Quad(id='1037', ...),
group = None,
hover_glyph = None,
js_event_callbacks = {},
js_property_callbacks = {},
level = 'glyph',
muted = False,
muted_glyph = Quad(id='1039', ...),
name = None,
nonselection_glyph = Quad(id='1038', ...),
selection_glyph = 'auto',
subscribed_events = [],
syncable = True,
tags = [],
view = CDSView(id='1041', ...),
visible = True,
x_range_name = 'default',
y_range_name = 'default')
Out[69]:
GlyphRenderer(
id = '1060', …)
coordinates = None,
data_source = ColumnDataSource(id='1056', ...),
glyph = Line(id='1057', ...),
group = None,
hover_glyph = None,
js_event_callbacks = {},
js_property_callbacks = {},
level = 'glyph',
muted = False,
muted_glyph = Line(id='1059', ...),
name = None,
nonselection_glyph = Line(id='1058', ...),
selection_glyph = 'auto',
subscribed_events = [],
syncable = True,
tags = [],
view = CDSView(id='1061', ...),
visible = True,
x_range_name = 'default',
y_range_name = 'default')
In [71]:
# Create the blank plot
# p = figure(plot_height = 600, plot_width = 600, 
#            title = 'Histogram of Arrival Delays',
#            x_axis_label = 'Delay (min)]', 
#            y_axis_label = 'Number of Flights')

# # Add a quad glyph
# p.quad(bottom=0, top=delays['flights'], 
#        left=delays['left'], right=delays['right'], 
#        fill_color='red', line_color='black')

# # Show the plot
# show(p)
In [72]:
corrMatrix = df.corr()
sns.heatmap(corrMatrix, annot=True)
plt.show()
Out[72]:
<AxesSubplot:>
In [73]:
df.corr()
Out[73]:
floor_area_sqm lease_commence_date resale_price storey_first storey_last storey_avg floor_area_sqft flat_type_label flat_model_label
floor_area_sqm 1.00 0.43 0.60 0.05 0.05 0.05 1.00 0.95 -0.41
lease_commence_date 0.43 1.00 0.52 0.15 0.15 0.15 0.43 0.45 -0.21
resale_price 0.60 0.52 1.00 0.23 0.23 0.23 0.60 0.63 -0.27
storey_first 0.05 0.15 0.23 1.00 1.00 1.00 0.05 0.08 -0.08
storey_last 0.05 0.15 0.23 1.00 1.00 1.00 0.05 0.08 -0.08
storey_avg 0.05 0.15 0.23 1.00 1.00 1.00 0.05 0.08 -0.08
floor_area_sqft 1.00 0.43 0.60 0.05 0.05 0.05 1.00 0.95 -0.41
flat_type_label 0.95 0.45 0.63 0.08 0.08 0.08 0.95 1.00 -0.37
flat_model_label -0.41 -0.21 -0.27 -0.08 -0.08 -0.08 -0.41 -0.37 1.00
# Metadata for Resale Flat Prices
---
Identifier: 7a339d20-3c57-4b11-a695-9348adfd7614
Name: resale-flat-prices
Title: Resale Flat Prices
Description:
  - Resale transacted prices.
  - Prior to March 2012, data is based on date of approval for the resale transactions.
  - For March 2012 onwards, the data is based on date of registration for the resale
    transactions.
Topics:
  - Infrastructure
Keywords:
  - Cost of Living
  - HDB
  - Housing
  - Property
  - Public Housing
  - Resale Flats
Publisher:
  Name: Housing and Development Board
  Admin 1:
    Name: Lau Pei Wen
    Department: HDB
    Email: LAU_Pei_Wen@hdb.gov.sg
  Admin 2:
    Name: Toh Xue Qin
    Department: HDB
    Email: toh_xue_qin@hdb.gov.sg
Sources:
  - Housing and Development Board
License: https://data.gov.sg/open-data-licence
Frequency: Monthly
Coverage: 1990-01-01 to 2022-03-10
Last Updated: 2022-03-11T01:00:32.750114
Resources:
  -
    Identifier: f1765b54-a209-4718-8d38-a39237f502b3
    Title: Resale flat prices based on registration date from Jan-2017 onwards
    Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards-2022-03-11T01-00-31Z.csv
    Format: CSV
    Coverage: 2017-01-01 to 2022-03-10
    Last Updated: 2022-03-11T01:00:31.456350
    Schema:
      -
        Name: month
        Title: Month
        Type: datetime
        Sub Type: month
        Format: YYYY-MM
      -
        Name: town
        Title: Town
        Type: text
        Sub Type: general
      -
        Name: flat_type
        Title: Flat type
        Type: text
        Sub Type: general
      -
        Name: block
        Title: Block
        Type: text
        Sub Type: general
      -
        Name: street_name
        Title: Street name
        Type: text
        Sub Type: general
      -
        Name: storey_range
        Title: Storey range
        Type: text
        Sub Type: general
      -
        Name: floor_area_sqm
        Title: Floor area sqm
        Type: numeric
        Sub Type: general
        Unit Of Measure: sqm
      -
        Name: flat_model
        Title: Flat model
        Type: text
        Sub Type: general
      -
        Name: lease_commence_date
        Title: Lease commence date
        Type: datetime
        Sub Type: year
        Format: YYYY
      -
        Name: remaining_lease
        Title: Remaining lease
        Type: text
        Sub Type: general
      -
        Name: resale_price
        Title: Resale price
        Type: numeric
        Sub Type: general
        Unit Of Measure: $
  -
    Identifier: 1b702208-44bf-4829-b620-4615ee19b57c
    Title: Resale Flat Prices (Based on Registration Date), From Jan 2015 to Dec
      2016
    Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-registration-date-from-jan-2015-to-dec-2016-2019-06-17T09-03-16Z.csv
    Format: CSV
    Coverage: 2015-01-01 to 2016-12-31
    Description:
      - Notes:
      - 
      - 1. The approximate floor area includes any recess area purchased, space adding
        item under HDB’s upgrading programmes, roof terrace, etc.
      - 
      - 2. The transactions exclude resale transactions that may not reflect the
        full market price such as resale between relatives and resale of part shares.
      - 
      - 3. Resale prices should be taken as indicative only as the resale prices
        agreed between buyers and sellers are dependent on many factors.
      - 
      - 4. "Remaining lease" is the number of years left before the lease ends. This
        information is computed as at the resale flat application.
    Last Updated: 2019-06-17T09:03:16.240486
    Schema:
      -
        Name: month
        Title: Month
        Type: datetime
        Sub Type: month
        Format: YYYY-MM
      -
        Name: town
        Title: Town
        Type: text
        Sub Type: general
      -
        Name: flat_type
        Title: Flat Type
        Type: text
        Sub Type: general
      -
        Name: block
        Title: Block
        Type: text
        Sub Type: general
      -
        Name: street_name
        Title: Street Name
        Type: text
        Sub Type: general
      -
        Name: storey_range
        Title: Storey Range
        Type: text
        Sub Type: general
      -
        Name: floor_area_sqm
        Title: Floor Area
        Type: numeric
        Sub Type: general
        Unit Of Measure: Sqm
      -
        Name: flat_model
        Title: Flat Model
        Type: text
        Sub Type: general
      -
        Name: lease_commence_date
        Title: Lease Commencement Date
        Type: datetime
        Sub Type: year
        Format: YYYY
      -
        Name: remaining_lease
        Title: Remaining Lease
        Type: numeric
        Sub Type: general
        Unit Of Measure: Years
      -
        Name: resale_price
        Title: Resale Price
        Type: numeric
        Sub Type: general
        Unit Of Measure: $
  -
    Identifier: 83b2fc37-ce8c-4df4-968b-370fd818138b
    Title: Resale Flat Prices (Based on Registration Date), From Mar 2012 to Dec
      2014
    Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-registration-date-from-mar-2012-to-dec-2014-2019-06-17T09-04-34Z.csv
    Format: CSV
    Coverage: 2012-03-01 to 2014-12-31
    Description:
      - Notes:
      - 
      - 1. The approximate floor area includes any recess area purchased, space adding
        item under HDB’s upgrading programmes, roof terrace, etc.
      - 
      - 2. The transactions exclude resale transactions that may not reflect the
        full market price such as resale between relatives and resale of part shares.
      - 
      - 3. Resale prices should be taken as indicative only as the resale prices
        agreed between buyers and sellers are dependent on many factors.
    Last Updated: 2019-06-17T09:04:34.538580
    Schema:
      -
        Name: month
        Title: Month
        Type: datetime
        Sub Type: month
        Format: YYYY-MM
      -
        Name: town
        Title: Town
        Type: text
        Sub Type: general
      -
        Name: flat_type
        Title: Flat Type
        Type: text
        Sub Type: general
      -
        Name: block
        Title: Block
        Type: text
        Sub Type: general
      -
        Name: street_name
        Title: Street Name
        Type: text
        Sub Type: general
      -
        Name: storey_range
        Title: Storey Range
        Type: text
        Sub Type: general
      -
        Name: floor_area_sqm
        Title: Floor Area
        Type: numeric
        Sub Type: general
        Unit Of Measure: Sqm
      -
        Name: flat_model
        Title: Flat Model
        Type: text
        Sub Type: general
      -
        Name: lease_commence_date
        Title: Lease Commence Date
        Type: datetime
        Sub Type: year
        Format: YYYY
      -
        Name: resale_price
        Title: Resale Price
        Type: numeric
        Sub Type: general
        Unit Of Measure: S$
  -
    Identifier: 8c00bf08-9124-479e-aeca-7cc411d884c4
    Title: Resale Flat Prices (Based on Approval Date), 2000 - Feb 2012
    Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-approval-date-2000-feb-2012-2019-06-28T10-14-13Z.csv
    Format: CSV
    Coverage: 2000-01-01 to 2012-02-29
    Description:
      - Notes:
      - 
      - 1. The approximate floor area includes any recess area purchased, space adding
        item under HDB’s upgrading programmes, roof terrace, etc.
      - 
      - 2. The transactions exclude resale transactions that may not reflect the
        full market price such as resale between relatives and resale of part shares.
      - 
      - 3. Resale prices should be taken as indicative only as the resale prices
        agreed between buyers and sellers are dependent on many factors.
    Last Updated: 2019-06-28T10:14:13.202606
    Schema:
      -
        Name: month
        Title: Month
        Type: datetime
        Sub Type: month
        Format: YYYY-MM
      -
        Name: town
        Title: Town
        Type: text
        Sub Type: general
      -
        Name: flat_type
        Title: Flat Type
        Type: text
        Sub Type: general
      -
        Name: block
        Title: Block
        Type: text
        Sub Type: general
      -
        Name: street_name
        Title: Street Name
        Type: text
        Sub Type: general
      -
        Name: storey_range
        Title: Storey Range
        Type: text
        Sub Type: general
      -
        Name: floor_area_sqm
        Title: Floor Area
        Type: numeric
        Sub Type: general
        Unit Of Measure: Sqm
      -
        Name: flat_model
        Title: Flat Model
        Type: text
        Sub Type: general
      -
        Name: lease_commence_date
        Title: Lease Commence Date
        Type: datetime
        Sub Type: year
        Format: YYYY
      -
        Name: resale_price
        Title: Resale Price
        Type: numeric
        Sub Type: general
        Unit Of Measure: S$
  -
    Identifier: adbbddd3-30e2-445f-a123-29bee150a6fe
    Title: Resale Flat Prices (Based on Approval Date), 1990 - 1999
    Url: https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-approval-date-1990-1999-2021-05-25T02-49-29Z.csv
    Format: CSV
    Coverage: 1990-01-01 to 1999-12-31
    Description:
      - Notes:
      - 
      - 1. The approximate floor area includes any recess area purchased, space adding
        item under HDB’s upgrading programmes, roof terrace, etc.
      - 
      - 2. The transactions exclude resale transactions that may not reflect the
        full market price such as resale between relatives and resale of part shares.
      - 
      - 3. Resale prices should be taken as indicative only as the resale prices
        agreed between buyers and sellers are dependent on many factors.
    Last Updated: 2021-05-25T02:49:29.941885
    Schema:
      -
        Name: month
        Title: Month
        Type: datetime
        Sub Type: month
        Format: YYYY-MM
      -
        Name: town
        Title: Town
        Type: text
        Sub Type: general
      -
        Name: flat_type
        Title: Flat Type
        Type: text
        Sub Type: general
      -
        Name: block
        Title: Block
        Type: text
        Sub Type: general
      -
        Name: street_name
        Title: Street Name
        Type: text
        Sub Type: general
      -
        Name: storey_range
        Title: Storey Range
        Type: text
        Sub Type: general
      -
        Name: floor_area_sqm
        Title: Floor Area
        Type: numeric
        Sub Type: general
        Unit Of Measure: Sqm
      -
        Name: flat_model
        Title: Flat Model
        Type: text
        Sub Type: general
      -
        Name: lease_commence_date
        Title: Lease Commence Date
        Type: datetime
        Sub Type: year
        Format: YYYY
      -
        Name: resale_price
        Title: Resale Price
        Type: numeric
        Sub Type: general
        Unit Of Measure: S$
In [ ]:
# print('Named explicitly:')
# for name in glob.glob('/home/geeks/Desktop/gfg/data.txt'):
#     print(name)
  
# # Using '*' pattern 
# print('\nNamed with wildcard *:')
# for name in glob.glob('/home/geeks/Desktop/gfg/*'):
#     print(name)
  
# # Using '?' pattern
# print('\nNamed with wildcard ?:')
# for name in glob.glob('/home/geeks/Desktop/gfg/data?.txt'):
#     print(name)
  
# # Using [0-9] pattern
# print('\nNamed with wildcard ranges:')
# for name in glob.glob('/home/geeks/Desktop/gfg/*[0-9].*'):
#     print(name)    
# # Python program to find files
# # recursively using Python
    
# # Returns a list of names in list files.
# print("Using glob.glob()")
# files = glob.glob('/home/geeks/Desktop/gfg/**/*.txt', 
#                    recursive = True)
# for file in files:
#     print(file)  
  
# # It returns an iterator which will 
# # be printed simultaneously.
# print("\nUsing glob.iglob()")
# for filename in glob.iglob('/home/geeks/Desktop/gfg/**/*.txt',
#                            recursive = True):
#     print(filename)